Do POS Tags Help to Learn Better Morphological Segmentations?

Authors

  • Kairit Sirts
  • Mark Johnson
Abstract

The utility of morphological features in part-of-speech (POS) tagging is well established in the literature. However, the usefulness of POS tag information for morphological segmentation is less clear. In this paper we study POS-dependent morphological segmentation in the Adaptor Grammars framework. We experiment with three scenarios: without POS tags, with gold-standard tags, and with automatically induced tags, and show that the segmentation F1-score improves when the tags are used. As expected, the gold-standard tags lead to the biggest improvement. However, using automatically induced tags also brings some improvement over the tag-independent baseline.
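As a rough illustration of the segmentation F1-score mentioned in the abstract, the sketch below computes a boundary-level F1 over a list of words. This is only one common way to score segmentations; the abstract does not say which variant the authors use, and the `+` separator, the function names, and the toy words are all assumptions made for this example.

```python
from typing import List, Set


def boundaries(segmentation: str, sep: str = "+") -> Set[int]:
    """Return the character offsets at which a morpheme boundary falls.

    The separator "+" is an assumed notation, e.g. "un+happi+ness".
    """
    positions: Set[int] = set()
    offset = 0
    # The last morph has no boundary after it, so it is skipped.
    for morph in segmentation.split(sep)[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_f1(gold: List[str], predicted: List[str]) -> float:
    """Micro-averaged F1 over morpheme boundaries (hypothetical helper)."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        gold_b, pred_b = boundaries(g), boundaries(p)
        tp += len(gold_b & pred_b)   # boundaries found correctly
        fp += len(pred_b - gold_b)   # boundaries predicted but not in gold
        fn += len(gold_b - pred_b)   # gold boundaries that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
gold = ["walk+ed", "un+happi+ness", "cat+s"]
predicted = ["walk+ed", "unhappi+ness", "ca+ts"]
score = boundary_f1(gold, predicted)
```

Under this definition, comparing the score of a tag-independent segmenter with that of a tag-aware one on the same gold data would give the kind of comparison the abstract describes.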

Related articles

A Hierarchical Dirichlet Process Model for Joint Part-of-Speech and Morphology Induction

In this paper we present a fully unsupervised nonparametric Bayesian model that jointly induces POS tags and morphological segmentations. The model is essentially an infinite HMM that infers the number of states from data. Incorporating segmentation into the same model provides the morphological features to the system and eliminates the need to find them during a preprocessing step. We show that ...

A Comparative Study of the Effect of Part-of-Speech Tagging on Parsing in Automatic Processing of the Persian Language

In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging as well as the impact of the quantity of information available in the POS tags on parsing are studied. To reach the goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...

Automatic Morphological Analysis for Russian: a Comparative Study

In this paper we present a comparison of ten systems for automatic morphological analysis: TreeTagger, TnT, HunPos, Lapos, Citar, Morfette, Mystem, Pymorphy, Stanford POS tagger and SVMTool. Different training and tagging approaches are discussed together with the strengths and weaknesses of each system. Probabilistic taggers were trained and tested on the Russian National Disambiguated Corpus a...

Part-of-speech tagging of Modern Hebrew text

Words in Semitic texts often consist of a concatenation of word segments, each corresponding to a Part-of-Speech (POS) category. Semitic words may be ambiguous with regard to their segmentation as well as to the POS tags assigned to each segment. When designing POS taggers for Semitic languages, a major architectural decision concerns the choice of the atomic input tokens (terminal symbols). If...

Turkish PoS Tagging by Reducing Sparsity with Morpheme Tags in Small Datasets

Sparsity is one of the major problems in natural language processing. The problem becomes even more severe in agglutinating languages that are highly prone to be inflected. We deal with sparsity in Turkish by adopting morphological features for part-of-speech tagging. We learn inflectional and derivational morpheme tags in Turkish by using conditional random fields (CRF) and we employ the morph...

Publication date: 2015